Variational quantum Monte Carlo simulations with tensor-network states 
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We show that the formalism of tensor-network states, such as the matrix product states (MPS), 
can be used as a basis for variational quantum Monte Carlo simulations. Using a stochastic opti- 
mization method, we demonstrate the potential of this approach by explicit MPS calculations for 
the transverse Ising chain with up to N = 256 spins at criticality, using periodic boundary condi- 
tions and D x D matrices with D up to 48. The computational cost of our scheme formally scales 
as ND 3 , whereas standard MPS approaches and the related density matrix renormalization group 
method scale as ND 5 and ND 6 , respectively, for periodic systems. 
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Devising unbiased computational methods for corre- 
lated quantum many-body systems remains one of the 
greatest challenges in theoretical physics. Considerable 
progress has been made in recent years. Quantum Monte 
Carlo (QMC) methods with efficient loop-cluster updates 
[E & Q now enable simulations of certain classes of spin 
and boson hamiltonians on very large lattices — up to 
> 10 4 sites essentially in the ground state and consid- 
erably more at elevated temperatures. Modern projector 
QMC methods Q can also access large lattices. Both 
approaches are already contributing significantly to fore- 
front areas of condensed matter physics, e.g., studies of 
exotic quantum phase transitions in antiferromagnets H . 
However, due to "sign problems" [y, |7|, most fermion 
systems in more than one dimension and spin mod- 
els with frustrated interactions are intractable to QMC 
simulations. The density matrix renormalization group 
(DMRG) method [8j, [9( , on the other hand, can produce 
essentially exact results for one-dimensional fermion sys- 
tems and frustrated spins, including systems of a few cou- 
pled chains (ladders) [lfj. These calculations are often 
restricted to open boundary conditions, however, which 
sometimes can be problematic. A more severe limitation 
is the exponential scaling in the computational complex- 
ity for systems with two or more dimensions [Tl| . 

The underlying reason for the problems with DMRG 
in higher dimensions has recently been identified as the 
inability of matrix product states (MPS), which are pro- 
duced by the DMRG method [l2j], to properly account 
for entanglement in dimensions higher than one [l3j |. In 
order to overcome this limitation, a generalization of the 
MPS was proposed — the projected-entangled pair states 
(PEPS) 11-411 - These states are based on tensor-product 
networks [151 ] , which are contracted using an approximate 
scheme. While this approach is very promising, practical 
applications are still hampered by the severe increase of 
the computational effort with the size D of the tensors in 
two dimensions. The scaling is typically ~ D 12 , and cal- 
culations are therefore currently restricted to very small 



UtI EH . Developing schemes with a more 
favorable D scaling is therefore a high priority. 

In principle MPS and PEPS can be used in variational 
QMC calculations. Sampling the physical states, instead 
of contracting the tensor network over those indices, for- 
mally reduces the scaling in D [191 ]. In practice, it is not 
clear how much can be achieved this way, however. An 
efficient method is required to optimize tensors with hun- 
dreds or thousands of independent parameters, based on 
noisy Monte Carlo estimates of the energy and its deriva- 
tives. In this Letter we demonstrate that such a program 
is actually feasible. We develo p a method based on a 
stochastic optimization scheme [2 01 ] which requires only 
the first energy derivatives. Here we focus on MPS for 
simplicity, but the scheme can be applied to more generic 
tensor networks, e.g., PEPS, as well. We test the method 
on the Ising chain in a transverse external field, 
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where a x and a z are the standard Pauli matrices. This 
system undergoes a quantum phase transition from a 
ground state with long-range Ising order in the z direc- 
tion for h < 1 to a state with disordered z components 
when h > 1. We here consider exclusively the computa- 
tionally most challenging h = 1 critical point. 

For a periodic chain, a translationally invariant matrix- 
product state with momentum k = is of the form [121 ] 
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■A(s N )}\si,s 2 ,...,s N ), 
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where the spins Sj = ±1 are the eigenvalues of of and 
A(zLl) are two D x D matrices (for a non-translationally 
invariant system the matrices would be site dependent). 
We here take the matrices to be real and symmetric, 
which, from properties of the trace, corresponds to a 
Si — > Sjv-j+i reflection symmetric state. The ground 
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state should also be invariant with respect to spin inver- 
sion; Si — y —Si for all i. A sufficient condition for this is 
that A(±l) are related by a transformation U such that 
U~ l A{l)U = A(-l) and U- l A{-l)U = A(l), which im- 
plies U 2 = I (the identity matrix). For simplicity, and 
because of indications that a greater flexibility of the ma- 
trices is advantageous for the optimization, we here only 
enforce the weaker condition that A(l) and A{— 1) have 
identical eigenvalues, using a scheme discussed below. 

Our goal is to find the matrix elements afj, s = ±1, 
that minimize the MPS energy E = (if). Denoting the 
wave function coefficient for state \S) = |si, . . . , sjv) 



W(S)=Ti{A( Sl )A(s 2 )---A(s N )}, 



(3) 



the energy, for given matrices A(±l), can be written in 
the form appropriate for Monte Carlo sampling; 

E=^W 2 (S)E(S), Z = ]TVF 2 (S), (4) 



where E(S) is the estimator 



(5) 



The energy can be evaluated using importance sampling 
of the spin configurations according to the weight W 2 (S); 
E = (E(S)). Our scheme also requires the derivatives of 
the energy with respect to the matrix elements; 
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where we have defined 



A? 



1 dW(S) 



11 W(S) da s l3 
Introducing the matrices 
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B(m) = A(s m+1 ) ■ ■ ■ A(s N )A( Sl ) ■ ■ ■ A(s m -i), (8) 
the derivative of the weight © is 



dW{S) 
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[Bijim) +B ji (m)}5. 
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We sample the states by generating successive config- 
urations from a stored S by single-spin flips; s m — > — s m . 
We denote the new tentative configuration S' m . Vis- 
iting the spins sequentially; m = 1,2, ...,N, we flip 
them according to the Metropolis probability; Pfu p = 
min[W 2 {S' m )/W 2 (S),l]. To evaluate P flip , we use the 
cyclic property of the trace and write the new coef- 
ficient as W(S' m ) = Tr{A(-s m )B{m)}. Further, we 
write the matrix B(m) in Eq. © as a product of left 
and right matrices B(m) = L(m + l)R(m — 1), where 



L(m) = A(s m ) ■ ■ ■ A(sm) and R(m) = A(si) ■ ■ ■ A(s m ). 
We also define L(N + 1) = R(Q) = I. Before starting 
the updating process we calculate and store the left ma- 
trices L(2), . . . , L(N), based on the initial spin configu- 
ration (random or from a previous run). Each successive 
spin-flip attempt then requires only one matrix multi- 
plication, and another for advancing the right matrix; 
R(m) = R(m — l)A(s m ). Since L(m) is no longer needed 
at this stage we store R(m) in its place for future use. 

Diagonal quantities, e.g., the Ising part of the energy, 
E z = — J2i °f°f+ij can be simply obtained by averaging 
the appropriate spin correlations in the stored state \S). 
To calculate off-diagonal quantities, ratios W(S')/W(S) 
are needed. After a full sweep of spin updates, all the 
matrices R{m) have been generated and stored. We can 
use them to speedily measure the off-diagonal energy 
E x — h J^i ( cr i~ + °i~) j the estimator of which is 
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as well as the derivatives ((9|). To evaluate the sums, we 
now traverse the system from m = N to 1, and in the 
process generate the left matrices L{m) and store them 
in the place of R(m). Once this process is completed we 
again have what we need to carry out an updating sweep 
in the manner described above. A full updating sweep, 
including measurements, thus requires 4iV matrix multi- 
plications (plus operations which have a lower scaling in 
D ), giving a formal scaling ND 3 of the algorithm. 

Carrying out successive simulations with fixed matrices 
A(±l), the energy and derivatives obtained on the basis 
of some number F of spin-flip sweeps (referred to as one 
simulation bin) are used to update the matrix elements 
with j > i according to (and subsequently = afj) [20| : 



S(k) 



signidE/datA (11) 



Here rfj € [0, 1) is random and 5(k) is the maximum 
change, which decreases as a function of a counter k — 
0, 1, ... Thus, instead of moving in the direction of the ap- 
proximately evaluated gradient, as in standard stochastic 
optimization 2l|,|22j, each parameter is changed indepen- 
dently, using the "correct" sign but with a random and 
well bounded magnitude for the step. This results in a 
very stable optimization ideally suited for problems with 
large numbers of parameters. For the gradual reduction 
of S, we here use a geometric form; S = SoQ k , with, typ- 
ically, Q — 0.9 — 0.95, but other forms also work well, 
e.g., S = 5ok~ a , with a £ [1/2, 1]. For each k, we com- 
plete a number, G, of bins, each followed by updates of 
the matrix elements. The number of sweeps per bin, F, 
as well as G are increased with k. The rationale behind 
increasing F(k) is that, as we approach the energy min- 
imum, the derivatives will become smaller and require 
more sampling in order not to be dominated by noise 
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FIG. 1: (Color online) Main panel: Convergence of the energy 
per site of a 16-site system at h — 1, using D = 8 and starting 
from random matrices. The cooling parameters were Q = 0.9, 
So = 0.05, Go = 10, F = 100. Inset: The later stages of the 
simulation on a more detailed scale, and a comparison with a 
run which started from a converged D = 6 calculation (lower 
curve; red symbols); here So = 0.005, Go = 5, Fo = 50. 

[2^ . Increasing G leads effectively to a slower "cooling" 
rate. We typically use a linear dependence in both cases; 
F = Fok, G = Gok. We output the energy and its sta- 
tistical error computed on the basis of the G bins before 
each increment of k. Since F and G increase with k, the 
error bars will decrease. For a sufficiently long run, if 
the cooling is slow enough, the calculated En{k) should 
approach the optimal energy for a given matrix size D. 

As we already mentioned, we wish to enforce the prop- 
erty that A(—l) and A(l) have the same eigenvalues. We 
do this after each adjustment of the matrix elements, by 
diagonalizing both matrices and averaging their eigenval- 
ues. The averaged diagonal matrix is then transformed 
back using the diagonalizing matrices for the original 
A(±l). If we do not carry out this diagonalization step 
we still in practice do obtain matrices with approximately 
equal eigenvalue spectra. However, enforcing this condi- 
tion exactly seems to have favorable effects on the abil- 
ity of the optimization method to quickly converge to a 
spin-inversion invariant ground state. We normalize the 
matrices so that the largest element \a"j\ = 1 . 

It should be noted that the optimal matrices are not 
unique — there is a huge degeneracy in terms of simul- 
taneous transformations of j4(±1) that leave the trace 
invariant. This may also be an advantage in the opti- 
mization, as we are not trying to locate a point, but only 
reach some large hypersurface in parameter space. 

In Fig. Q] we show an example of the convergence of the 
optimization for a 16-site chain, using D = 8 and start- 
ing from random A(±l). The initial maximum parame- 
ter shift was 5q = 0.05. We compare with a run which 
started from matrices resulting from a calculation with 
D = 6 (with the new matrix elements in the larger ma- 
trices generated at random in the range [—So, So]), which 
allows for a smaller initial step So = 0.005. The latter 
calculation produces a marginally lower energy, showing 




FIG. 2: (Color online) Relative error of the energy and 
squared magnetization versus the matrix dimension D. 

that the cooling rate in the former case was slightly too 
fast — cooling slower we obtain consistent results. 

It is useful to start the optimization for some TV and D 
from A(±l) previously obtained for a smaller N and the 
same D, or the same N and smaller D. Another good 
strategy is to first do a short run with a large So ~ 0.1 
to achieve convergence only approximately, and then to 
restart the calculation with a smaller S [but much larger 
than the smallest S(k) reached previously]. After a few 
such restarts there are typically no further changes in the 
minimum energy reached. 

We do not claim that the cooling protocol presented 
above is optimal; further improvements could potentially 
lead to considerable efficiency gains. However, even as it 
stands now the scheme performs remarkably well. 

We now compare simulation results with the exact so- 
lution [24| of the critical transverse Ising chain, we con- 
sider the energy as well as the squared magnetization; 
M 2 = (J2i a i) 2 /N 2 - The convergence with D is illus- 
trated in Fig. [21 In the case of the energy, a desired rel- 
ative accuracy requires a D which eventually approaches 
a constant for large N. The squared magnetization is di- 
rectly related to the long-distance physics, however, and 
our results are consistent with the expectation that D 
has to grow as some power, D ~ N a , to achieve a given 
relative accuracy. From Fig. [2] we obtain, roughly, a in 
the range 0.5 — 1. The statistical errors in Fig. [2] are 
smaller than the symbols. The slight jaggedness of the 
curves for L = 256, in particular, reflects the fact that it 
is not possible in practice to reach the optimum exactly. 
Nevertheless, it is clear from these tests that our scheme 
allows for a systematic approach to the ground state. 

In Table [J we show results for the largest D considered 
for each N. The statistical errors of the energies are not 
shown, but are at most ±2 in the last digit (i.e., 10~ 8 ). 
For a variational wave function that can exactly repro- 
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TABLE I: Variational QMC results for the critical transverse 
Ising chain compared with the exact solution [24|. The error 
bars of the MPS energies are » 10 -8 or smaller. 

N D E/N (MPS) E/N (ex.) M 2 (MPS) M 2 (ex.) 

0.522332 
0.440795 
0.371455 
0.312752 
0.263192 
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16 12 -1.27528715 -1.27528715 0.52233(2) 

32 16 -1.27375097 -1.27375102 0.44076(5) 

64 20 -1.27336736 -1.27336739 0.37151(9) 

128 32 -1.27327145 -1.27327150 0.3126(1) 

256 48 -1.27324731 -1.27324753 0.2630(2) 



duce the exact ground state, which should be the case 
here for D — * oo, the fluctuations in the energy should 
vanish. We indeed observe a strong reduction of the sta- 
tistical errors of Eo{k) with increasing D, as reflected 
in the very small error bars. For N > 16, there is still 
some small discrepancies beyond statistical errors, which 
we believe arc not due to the finite D but incomplete 
optimization. The ability of a stochastic scheme to reach 
so close to the optimum is still quite remarkable. 

We have also carried out simulations with general non- 
symmetric matrices. In order to strictly enforce the lat- 
tice reflection and spin-inversion symmetries, we then use 
a wave function with a trace of four different matrix prod- 
ucts related to each other by these symmetries, i.e., 

W(S) = Tt{P(S) + P(S R ) + P(-S) + P(-S R )}, (12) 

where P(S) = A(si) • ■ -A(sn), and —S and Sr are ob- 
tained by, respectively, spin-inverting and reflecting the 
configuration S. For given D, this wave function has a 
lower optimum energy than one with a single product 
P(S). The energy is also better than for the symmet- 
ric matrices discussed above. The computational effort 
is higher by a factor of 4, however, and the optimization 
converges slightly slower. 

In summary, we have demonstrated that the vari- 
ational QMC approach can be successfully combined 
with the versatility of tensor-network states, for a sign- 
problem free and systematically refinable (through the 
tensor dimension D) generic many-body method. The 
scaling with the matrix size D in the case of MPS for 
periodic chains is formally reduced from D 5 [25| to D 3 , 
and similar reductions are possible with tensor networks 
in higher dimensions [l^l ■ There may of course be some 
further non-obvious D dependence in the convergence 
properties of the sampling and optimization schemes — 
its is clear that stochastic optimization will be difficult 
in practice for D much larger than the maximum D = 48 
considered here. It should be noted, however, that other 
MPS schemes, as well as DMRG, also have convergence 
issues beyond the formal scaling in N and D. 

At the late stag es of completing this work we became 
aware of Ref. [26J, where a different QMC approach is 
proposed in the same spirit and applied to "string" states. 

AWS would like to thank Y.-J. Kao for stimulating 
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